Combining a Gaussian mixture model front end with MFCC parameters

نویسندگان

  • Matthew N. Stuttle
  • Mark J. F. Gales
چکیده

Fitting a Gaussian mixture model (GMM) to the smoothed speech spectrum allows an alternative set of features to be extracted from the speech signal. These features have been shown to possess information complementary to the standard MFCC parameterisation. This paper further investigates the use of these GMM features in combination with MFCCs. The extraction and use of a confidence metric to combine GMM features with MFCCs is described. Results using the confidence metric on the WSJ task are presented. Also, GMM features for speech corrupted with additive noise are extracted from data corrupted with coloured addititve noise. Techniques for noise robustness and compensation are investigated for GMM features and the performance is examined on the RM task with additive noise.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Combining evidences from mel cepstral, cochlear filter cepstral and instantaneous frequency features for detection of natural vs. spoofed speech

Speech synthesis and voice conversion techniques can pose threats to current speaker verification (SV) systems. For this purpose, it is essential to develop front end systems that are able to distinguish human speech vs. spoofed speech (synthesized or voice converted). In this paper, for the ASVspoof 2015 challenge, we propose a detector based on combination of cochlear filter cepstral coeffici...

متن کامل

A mixture of Gaussians front end for speech recognition

This paper describes a feature extraction technique based on fitting a Gaussian mixture model (GMM) to the speech spectral envelope. The features obtained (the component means, variances and priors) represent both the the general shape of the spectrum and provide information on the position of the spectral peaks. As the features select peaks in the spectrum they are related to the formant ampli...

متن کامل

Combining Gaussian Mixture Models and Segmental Feature Models for Speaker Recognition

In most speaker recognition systems speech utterances are not constrained in content or language. In a text-dependent speaker recognition system lexical content of speech and language are known in advance. The goal of this paper is to show that this information can be used by a segmental features (SF) approach to improve a standard Gaussian mixture model with MFCC features (GMM-MFCC). Speech fe...

متن کامل

Emotion recognition in spontaneous speech using GMMs

Automatic detection of emotions has been evaluated using standard Mel-frequency Cepstral Coefficients, MFCCs, and a variant, MFCC-low, calculated between 20 and 300 Hz, in order to model pitch. Also plain pitch features have been used. These acoustic features have all been modeled by Gaussian mixture models, GMMs, on the frame level. The method has been tested on two different corpora and langu...

متن کامل

Emotion Recognition in Spontaneous Speech

Automatic detection of emotions has been evaluated using standard Mel-frequency Cepstral Coefficients, MFCCs, and a variant, MFCC-low, that is calculated between 20 and 300 Hz in order to model pitch. Plain pitch features have been used as well. These acoustic features have all been modeled by Gaussian mixture models, GMMs, on the frame level. The method has been tested on two different corpora...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002